Fusion of audio and video modalities for detection of acoustic events

نویسندگان

  • Taras Butko
  • Andrey Temko
  • Climent Nadeu
  • Cristian Canton-Ferrer
چکیده

Detection of acoustic events (AED) that take place in a meeting-room environment becomes a difficult task when signals show a large proportion of temporal overlap of sounds, like in seminar-type data, where the acoustic events often occur simultaneously with speech. Whenever the event that produces the sound is related to a given position or movement, video signals may be a useful additional source of information for AED. In this work, we aim at improving the AED accuracy by using two complementary audio-based AED systems, built with SVM and HMM classifiers, and also a video-based AED system, which employs the output of a 3D video tracking algorithm to improve detection of steps. Fuzzy integral is used to fuse the outputs of the three classification systems in two stages. Experimental results using the CLEAR'07 evaluation data show that the detection rate increases by fusing the two audio information sources, and it is further improved by including video information.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Detection of Overlapped Acoustic Events using Fusion of Audio and Video Modalities

Acoustic event detection (AED) may help to describe acoustic scenes, and also contribute to improve the robustness of speech technologies. Even if the number of considered events is not large, that detection becomes a difficult task in scenarios where the AEs are produced rather spontaneously and often overlap in time with speech. In this work, fusion of audio and video information at either fe...

متن کامل

Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large amount of errors, which are mostly due to temporal overlaps. Actually, temporal overlaps accounted for more than 70% of errors in the realworld interactive seminar record...

متن کامل

Non - Speech Acoustic Event Detection Using

Non-speech acoustic event detection (AED) aims to recognize events that are relevant to human activities associated with audio information. Much previous research has been focused on restricted highlight events, and highly relied on ad-hoc detectors for these events. This thesis focuses on using multimodal data in order to make non-speech acoustic event detection and classification tasks more r...

متن کامل

Feature-Level Decision Fusion for Audio-Visual Word Prominence Detection

Common fusion techniques in audio-visual speech processing operate on the modality level. I.e. they either combine the features extracted from the two modalities directly or derive a decision for each modality separately and then combine the modalities on the decision level. We investigate the audio-visual processing of linguistic prosody, more precisely the extraction of word prominence. In th...

متن کامل

A fusion scheme of visual and auditory modalities for event detection in sports video

In this paper, we propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, where we classify video shots into several major or interesting classes, each of which has clear semantic meanings. Among major shot classes we perform classification of the different auditory signal segments (i....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008